Lightly supervised training for risk-based discriminative language models

Authors

  • Akio Kobayashi
  • Takahiro Oku
  • Yuya Fujita
  • Shoei Sato
Abstract

We propose a lightly supervised training method for a discriminative language model (DLM) based on risk minimization criteria. In lightly supervised training, pseudo labels generated by automatic speech recognition (ASR) are used as references. However, because these labels usually contain recognition errors, discriminative models estimated from such faulty reference labels may degrade ASR performance. An approach that prevents this degradation is therefore necessary for discriminative language modeling. In our proposed lightly supervised training, the DLM is estimated from a "fused" risk, a relaxed version of the conventional Bayes risk. The fused risk is computed in a supervised manner when pseudo labels are accepted as references with high confidence, and in an unsupervised manner when the labels are rejected due to low confidence. Accordingly, minimizing the fused risk over the training lattices yields a DLM with smoothed model parameters. The experimental results show that our proposed lightly supervised training method significantly reduced the word error rate compared with DLMs trained in conventional lightly supervised manners.
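The confidence-gated combination described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the data layout, the threshold, and the use of lattice posterior entropy as the unsupervised surrogate risk are all illustrative assumptions.

```python
import math

def fused_risk(lattices, threshold=0.8):
    """Sketch of a confidence-gated "fused" risk over training lattices.

    Each lattice is assumed to carry an ASR confidence for its pseudo label
    and a list of (posterior, error) pairs for its hypotheses, where `error`
    is the edit distance to the pseudo-label reference (hypothetical layout).
    """
    total = 0.0
    for lat in lattices:
        conf = lat["confidence"]  # ASR confidence of the pseudo label
        if conf >= threshold:
            # Supervised branch: pseudo label trusted as reference, so use
            # the posterior-weighted expected error (a Bayes-risk term).
            risk = sum(p * err for p, err in lat["hyps"])
        else:
            # Unsupervised branch: no trusted reference, so fall back to the
            # posterior entropy of the lattice as a surrogate risk.
            risk = -sum(p * math.log(p) for p, _ in lat["hyps"] if p > 0)
        total += risk
    return total
```

Minimizing such an objective with respect to the DLM parameters (which shape the hypothesis posteriors) would then blend supervised and unsupervised gradients across the training set.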


Similar papers

Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data

This paper introduces a method for lightly supervised discriminative training using MMI to improve the alignment of speech and text data for use in training HMM-based TTS systems for low-resource languages. In TTS applications, due to the use of long-span contexts, it is important to select training utterances which have wholly correct transcriptions. In a low-resource setting, when using poorl...


Discriminative data selection for lightly supervised training of acoustic model using closed caption texts

We present a novel data selection method for lightly supervised training of acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct...


Discriminative bilinear language modeling for broadcast transcriptions

A discriminative bilinear language model (DBLM) estimated on the basis of Bayes risk minimization is described. The discriminative language model (DLM) is conventionally trained by using n-gram features. However, given a large amount of training data, the DLM is not necessarily trained efficiently because of the increasing number of unique features. In addition, though some of the n-grams share...


Risk-Based Semi-Supervised Discriminative Language Modeling for Broadcast Transcription

This paper describes a new method for semi-supervised discriminative language modeling, which is designed to improve the robustness of a discriminative language model (LM) obtained from manually transcribed (labeled) data. The discriminative LM is implemented as a log-linear model, which employs a set of linguistic features derived from word or phoneme sequences. The proposed semi-supervised di...


Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training

The paper addresses a scheme of lightly supervised training of an acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct one among...




Publication year: 2013